Perform reachability analysis on a per-harness basis #2439

celinval · 2023-05-12T17:45:49Z

Description of changes:

The Kani compiler used to generate a single goto-program for all harnesses in a crate. In some cases, this had a negative impact on harness verification time. This was first reported in #1659, and it is now blocking the toolchain upgrade from #2406.

The main changes were made in the compiler's compiler_interface module and in the driver's project module. The compiler now gathers all harnesses up front and performs the reachability and codegen steps separately for each harness. All files related to a harness's goto-program follow the naming convention below:

<BASE_NAME>_<MANGLED_NAME>.<EXTENSION>

This applies to symtab / goto / type_map / restriction files.
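As a rough illustration of this convention, the helper below builds a per-harness file name from a base path, a mangled harness name, and an extension. It is a sketch, not the compiler's code: `harness_artifact_path`, the `target/my_crate` base, the mangled name, and the `goto` extension are all hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not Kani's API): builds
/// `<BASE_NAME>_<MANGLED_NAME>.<EXTENSION>` in the same directory as `base`.
fn harness_artifact_path(base: &Path, mangled_name: &str, extension: &str) -> PathBuf {
    // The last component of `base` plays the role of <BASE_NAME>.
    let base_name = base
        .file_name()
        .and_then(|name| name.to_str())
        .unwrap_or_default();
    base.with_file_name(format!("{base_name}_{mangled_name}.{extension}"))
}

fn main() {
    // Illustrative values only; real mangled names come from the compiler.
    let base = Path::new("target/my_crate");
    let path = harness_artifact_path(base, "_RNvCs1a_8my_crate5check", "goto");
    println!("{}", path.display());
}
```

Because every per-harness file shares the crate's base name as a prefix, the files for different harnesses of the same crate land side by side without overwriting each other.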

The metadata file is still generated once per target crate, and its name is still the same (<BASE_NAME>.kani-metadata.json).

On the driver side, the way we process artifacts has changed. The driver now reads the metadata for each crate and collects all artifacts based on the symtab goto file recorded in the metadata of each harness.
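A minimal sketch of that driver-side collection step, assuming heavily simplified metadata types: `crate_name`, `proof_harnesses`, and `goto_file` mirror fields of the real kani_metadata structs, but the `name` field and the `collect_goto_files` function are hypothetical stand-ins, not the driver's actual code.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

// Simplified stand-ins for the real metadata types, which have more fields.
struct HarnessMetadata {
    name: String, // hypothetical field, for illustration only
    goto_file: Option<PathBuf>,
}

struct KaniMetadata {
    crate_name: String,
    proof_harnesses: Vec<HarnessMetadata>,
}

/// Collects every distinct goto file referenced by the harnesses of all crates.
fn collect_goto_files(crates: &[KaniMetadata]) -> BTreeSet<PathBuf> {
    crates
        .iter()
        .flat_map(|krate| krate.proof_harnesses.iter())
        .filter_map(|harness| harness.goto_file.clone())
        .collect()
}

fn main() {
    let crates = vec![KaniMetadata {
        crate_name: "my_crate".into(),
        proof_harnesses: vec![
            HarnessMetadata { name: "check_a".into(), goto_file: Some("my_crate_check_a.goto".into()) },
            HarnessMetadata { name: "check_b".into(), goto_file: Some("my_crate_check_b.goto".into()) },
        ],
    }];
    for krate in &crates {
        for harness in &krate.proof_harnesses {
            println!("{}::{} -> {:?}", krate.crate_name, harness.name, harness.goto_file);
        }
    }
    println!("{:?}", collect_goto_files(&crates));
}
```

Collecting into a `BTreeSet` deduplicates the paths, which also covers the case where several harnesses still point to the same goto file.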

These changes do not apply to --function. In that mode, we still keep all artifacts based on the crate's <BASE_NAME>, and we have separate logic to handle it. Fixing this is tracked by #2129.

Resolved issues:

Resolves #1855

Related RFC:

Call-outs:

There are a few TODOs in this code that I left to avoid making this PR even bigger. I'll try to cross them out in the next couple of weeks. The only one that I think is somewhat urgent is the one in compiler_interface.rs about making sure the test description and the mono item align. Although this logic works today, it is rather fragile.

Unfortunately, this PR will increase compilation time. We could try to optimize this logic, but any optimization I can think of would make this PR much bigger. Also, the best optimization might just be improving our goto codegen performance in general.
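To illustrate where the extra compilation time comes from, here is a toy model of per-harness reachability. It is not Kani's implementation (which works on MIR mono items); the call graph and function names are hypothetical. Each harness gets its own worklist traversal, so a function reachable from several harnesses is collected, and later code-generated, once per harness instead of once per crate.

```rust
use std::collections::{HashMap, HashSet};

/// Toy call graph (hypothetical): two harnesses share `helper`,
/// and only `harness_b` reaches `big_fn`.
fn example_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("harness_a", vec!["helper"]),
        ("harness_b", vec!["helper", "big_fn"]),
        ("big_fn", vec!["helper"]),
        ("helper", vec![]),
    ])
}

/// Returns every function reachable from `root` (worklist traversal).
fn reachable_from<'a>(call_graph: &HashMap<&'a str, Vec<&'a str>>, root: &'a str) -> HashSet<&'a str> {
    let mut visited = HashSet::new();
    let mut worklist = vec![root];
    while let Some(item) = worklist.pop() {
        // `insert` returns false if `item` was already visited.
        if visited.insert(item) {
            if let Some(callees) = call_graph.get(item) {
                worklist.extend(callees.iter().copied());
            }
        }
    }
    visited
}

fn main() {
    let graph = example_graph();
    let harnesses = ["harness_a", "harness_b"];

    // Per-harness mode: one reachability (and codegen) pass per harness.
    let mut per_harness_total = 0;
    for harness in harnesses {
        let items = reachable_from(&graph, harness);
        println!("{harness}: {} reachable items", items.len());
        per_harness_total += items.len();
    }

    // Crate-wide mode: the union, where shared items are processed once.
    let crate_wide: HashSet<&str> = harnesses
        .iter()
        .flat_map(|&h| reachable_from(&graph, h))
        .collect();

    // Prints "per-harness: 5 items, crate-wide: 4 items"
    println!("per-harness: {per_harness_total} items, crate-wide: {} items", crate_wide.len());
}
```

The trade-off is visible even here: `harness_a` no longer carries `big_fn` into its goto-program (smaller verification problem), while `helper` is processed twice (more compilation work).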

Testing:

How is this change tested?
Is this a refactor change? Yes

Checklist

Each commit message has a non-empty body, explaining why the change was made
Methods or procedures are documented
Regression or unit tests are included, or existing tests cover the modified code
My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.


celinval · 2023-05-12T20:14:12Z

I still need to add the test I've been working on, but the code is ready for review.

celinval · 2023-05-12T21:21:59Z


Done!


zhassan-aws

Here's an initial set of comments. I'm having difficulty wrapping my head around how the project, target crate, harness, etc. are structured or related.

The driver will now read the metadata for each crate, and collect all artifacts based on the symtab goto file that is recorded in the metadata of each harness.

Does each crate metadata file contain metadata for each harness in that crate?


celinval · 2023-05-14T00:33:03Z


Yes. That actually didn't change in this PR; it was a previous change.

Each crate has one metadata file, which contains a vector of harness metadata entries. Each harness metadata entry contains the name of a goto file.

Before this change, the goto file name was redundant, since all harness metadata entries pointed to the same goto file. With this change, the driver extracts the goto file for each harness from its metadata.

celinval · 2023-05-14T00:50:02Z

Btw, the crate metadata is defined here:

kani/kani_metadata/src/lib.rs (lines 20 to 30 in 8d79ee5):

    pub struct KaniMetadata {
        /// The crate name from which this metadata was extracted.
        pub crate_name: String,
        /// The proof harnesses (`#[kani::proof]`) found in this crate.
        pub proof_harnesses: Vec<HarnessMetadata>,
        /// The features found in this crate that Kani does not support.
        /// (These general translate to `assert(false)` so we can still attempt verification.)
        pub unsupported_features: Vec<UnsupportedFeature>,
        /// If crates are built in test-mode, then test harnesses will be recorded here.
        pub test_harnesses: Vec<HarnessMetadata>,
    }

It holds metadata for both proof and test harnesses. Each harness metadata entry stores the path to its model file here:

kani/kani_metadata/src/harness.rs (lines 23 to 24 in 8d79ee5):

    /// Optional modeling file that was generated by the compiler that includes this harness.
    pub goto_file: Option<PathBuf>,

That said, I was hoping we could change that field to a vector of artifacts instead. That would simplify the driver and also be engine-agnostic.
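A sketch of what the proposed vector-of-artifacts field might look like, under the assumption that each engine picks the files it needs by extension. The `artifacts` field, the `artifact_with_extension` method, and the file names are all hypothetical; this is one possible shape for the proposal, not an existing Kani API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical shape for the proposal: instead of a single `goto_file`,
/// each harness records every artifact the compiler produced for it.
struct HarnessMetadata {
    name: String,
    artifacts: Vec<PathBuf>,
}

impl HarnessMetadata {
    /// Returns the first artifact whose extension matches `ext`, if any.
    fn artifact_with_extension(&self, ext: &str) -> Option<&Path> {
        self.artifacts
            .iter()
            .map(PathBuf::as_path)
            .find(|path| path.extension().and_then(|e| e.to_str()) == Some(ext))
    }
}

fn main() {
    let harness = HarnessMetadata {
        name: "check_foo".into(),
        artifacts: vec![
            PathBuf::from("my_crate_check_foo.goto"),
            PathBuf::from("my_crate_check_foo.type_map.json"),
        ],
    };
    // An engine that consumes goto binaries asks only for the extension it needs.
    if let Some(goto) = harness.artifact_with_extension("goto") {
        println!("{}: {}", harness.name, goto.display());
    }
}
```

With this shape, the driver no longer needs engine-specific knowledge of which single file to load; a different backend would simply look up a different artifact.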

zhassan-aws

Thanks @celinval! Can we evaluate the impact of this change on the compilation time of a big project (e.g. s2n-quic)? The verification time numbers look great, but I wanted to make sure compilation time is not severely impacted.


celinval · 2023-05-15T19:57:46Z


Totally! That's exactly what I've been doing now. I noticed that the "Kani CI / perf" job finished rather quickly, so I'm hoping the overall time is still better. That said, the compilation time does become significant and it increases the need for optimizing the compiler.

Unrelated to this change, I've also noticed that Kani driver is starting to consume a lot of memory when verifying the s2n-quic-core crate.

celinval · 2023-05-15T22:06:52Z

I manually ran Kani on s2n-quic to measure the overall execution time and the compilation time using a release build. I compared the head of this PR (fd0075c) against main (7c4400d):

|                  | Main (7c4400d) | PR (fd0075c) |
|------------------|----------------|--------------|
| Compilation time | 21s            | 35s          |
| Overall time     | 12min 30s      | 8min 24s     |
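For reference, the relative changes in these numbers can be computed directly (a small arithmetic check added here, not part of the original measurements): compilation grows by about 66.7%, while overall time drops by about 32.8%, saving 246 s per run.

```rust
/// Relative change from `before` to `after`, in percent.
fn percent_change(before_secs: f64, after_secs: f64) -> f64 {
    (after_secs - before_secs) / before_secs * 100.0
}

fn main() {
    let compile = percent_change(21.0, 35.0); // 21s -> 35s
    let overall_before = 12.0 * 60.0 + 30.0; // 12min 30s = 750s
    let overall_after = 8.0 * 60.0 + 24.0; // 8min 24s = 504s
    let overall = percent_change(overall_before, overall_after);

    println!("compilation: {compile:+.1}%"); // +66.7%
    println!("overall: {overall:+.1}%"); // -32.8%
    println!("saved: {}s", overall_before - overall_after); // 246s
}
```

The 14 extra seconds of compilation are small next to the roughly 4 minutes saved in verification, which is the basis for the conclusion below.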

I think this change is still fairly beneficial overall for our users. I suggest that we prioritize compiler performance improvements in a follow-up PR. I created an issue to add more data to benchcomp and to investigate optimizations.

zhassan-aws · 2023-05-15T22:10:16Z

Nice! Ship it!

This fixes a regression introduced in #2439 when writing the symtab JSON is enabled: we still need to take those JSON files into account and remove them if needed. This change also simplifies the write-symtab-JSON regression test to avoid the out-of-disk-space issue we've been seeing since #2439.

This fixes a regression from #2439. The compiler should store the location of the function body instead of the declaration. Storing the correct location fixes how concrete playback stores the generated unit test.

This was a regression introduced by #2439. We were still writing the result of the reachability algorithm to the same file for every harness. Thus, we could only see the MIR for the last harness that was processed. Use the file name that is specific for the harness instead and generate one MIR file per harness like we do with other files generated by kani-compiler.

celinval added 3 commits May 12, 2023 10:35

Implement per-harness for standalone kani

ef87e9c

Still missing: - cargo kani - other reachability modes

Implement support for pub_fns + tests reachability

2a68a21

Implement the cargo kani logic

817ea7e

celinval marked this pull request as ready for review May 12, 2023 20:13

celinval requested a review from a team as a code owner May 12, 2023 20:13

Add a test

8917a0a

Merge remote-tracking branch 'origin/main' into issue-1855-per-harness-2

bbf6a63

Conflicts: kani-compiler/src/codegen_cprover_gotoc/compiler_interface.rs

zhassan-aws reviewed May 13, 2023


zhassan-aws reviewed May 15, 2023


kani-driver/src/call_cbmc_viewer.rs (resolved)

celinval added 2 commits May 15, 2023 14:38

Merge remote-tracking branch 'origin/main' into issue-1855-per-harness-2

fd0075c

Address feedback + improve test

85605f1

zhassan-aws approved these changes May 15, 2023


Merge branch 'main' into issue-1855-per-harness-2

441d13c

celinval enabled auto-merge (squash) May 16, 2023 00:14

celinval merged commit 2a09d79 into model-checking:main May 16, 2023

celinval mentioned this pull request May 16, 2023

Fix symtab json file removal and reduce regression scope #2447

Merged


celinval mentioned this pull request May 19, 2023

Fix regression on concrete playback inplace #2454

Merged


zhassan-aws mentioned this pull request May 25, 2023

Avoid an unnecessary suffix to the goto filename #2472

Merged


celinval mentioned this pull request Jun 22, 2023

Fix MIR dump to emit one MIR per-harness #2556

Merged

